Add metrics to controllers tracking reconciliations #154

Closed
wants to merge 3 commits into from

Conversation

@mh013370 mh013370 commented Aug 25, 2022

Q                 A
Bug fix?          no
New feature?      yes
API breaks?       no
Deprecations?     no
Related tickets   fixes #150
License           Apache 2.0

What's in this PR?

I've added Prometheus metrics capturing all of the following:

  • Go process runtime metrics (CPU, memory, etc.)
  • Go build info
  • Per-controller reconciliation metrics tracking the number of successes, the number of errors, the duration of reconciliation loops, and whether the controller is ready
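
As a rough illustration of what the per-controller metrics in the last bullet measure, here is a dependency-free Go sketch. It is not the code in this PR, which exposes Prometheus collectors; the `Recorder` type and its `SetReady`, `Observe`, and `Track` methods are hypothetical names, and the bucket bounds are assumed to match the `le` values in the sample output posted later in this thread.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucketBounds are the histogram upper bounds in seconds (assumed values).
var bucketBounds = []float64{0.1, 0.5, 1, 5, 10}

// stats holds the counters kept for a single controller.
type stats struct {
	ready      bool
	operations uint64
	errs       uint64
	sum        float64  // total seconds spent reconciling
	buckets    []uint64 // cumulative counts, one per entry in bucketBounds
}

// Recorder is a hypothetical in-memory stand-in for a metrics registry.
type Recorder struct {
	mu    sync.Mutex
	byCtl map[string]*stats
}

func NewRecorder() *Recorder {
	return &Recorder{byCtl: make(map[string]*stats)}
}

// get returns the stats for a controller, creating them on first use.
// Callers must hold r.mu.
func (r *Recorder) get(controller string) *stats {
	s, ok := r.byCtl[controller]
	if !ok {
		s = &stats{buckets: make([]uint64, len(bucketBounds))}
		r.byCtl[controller] = s
	}
	return s
}

// SetReady marks a controller as ready (or not) to reconcile resources.
func (r *Recorder) SetReady(controller string, ready bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.get(controller).ready = ready
}

// Observe records one reconcile loop that took the given number of seconds.
func (r *Recorder) Observe(controller string, seconds float64, failed bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	s := r.get(controller)
	s.operations++
	if failed {
		s.errs++
	}
	s.sum += seconds
	for i, bound := range bucketBounds {
		if seconds <= bound {
			s.buckets[i]++ // every bucket whose bound covers the duration
		}
	}
}

// Track times a reconcile function and records its outcome.
func (r *Recorder) Track(controller string, reconcile func() error) error {
	start := time.Now()
	err := reconcile()
	r.Observe(controller, time.Since(start).Seconds(), err != nil)
	return err
}

func main() {
	rec := NewRecorder()
	rec.SetReady("NifiCluster", true)
	rec.Observe("NifiCluster", 0.3, false)
	rec.Observe("NifiCluster", 2, true)
	_ = rec.Track("NifiCluster", func() error { return nil })

	s := rec.byCtl["NifiCluster"]
	fmt.Printf("ready=%v ops=%d errors=%d buckets=%v\n",
		s.ready, s.operations, s.errs, s.buckets)
}
```

The buckets are cumulative, which is how `le` labels behave in a Prometheus histogram: an observation of 0.3 seconds increments every bucket whose upper bound is 0.5 or higher, and the implicit `+Inf` bucket always equals the operation count.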

Why?

To improve observability of the nifikop operator.

Checklist

  • Implementation tested
  • Error handling code meets the guideline
  • Logging code meets the guideline
  • User guide and development docs updated (if needed)
  • Append changelog with changes

@mh013370

I just need to deploy this and verify the metrics are getting set correctly, but this is ready for review.

mh013370 commented Aug 25, 2022

> I just need to deploy this and verify the metrics are getting set correctly, but this is ready for review.

Confirmed that these metrics are now reported by nifikop:

root ➜ /workspace $ curl localhost:8080/metrics|grep nifikop
# HELP nifikop_operator_ready 1 when the controller is ready to reconcile resources, 0 otherwise
# TYPE nifikop_operator_ready gauge
nifikop_operator_ready{controller="NifiCluster"} 1
nifikop_operator_ready{controller="NifiClusterTask"} 1
nifikop_operator_ready{controller="NifiDataflow"} 1
nifikop_operator_ready{controller="NifiNodeGroupAutoscaler"} 1
nifikop_operator_ready{controller="NifiParameterContext"} 1
nifikop_operator_ready{controller="NifiRegistryClient"} 1
nifikop_operator_ready{controller="NifiUser"} 1
nifikop_operator_ready{controller="NifiUserGroup"} 1
# HELP nifikop_operator_reconcile_duration_seconds Histogram of reconcile operations
# TYPE nifikop_operator_reconcile_duration_seconds histogram
nifikop_operator_reconcile_duration_seconds_bucket{controller="NifiCluster",le="0.1"} 0
nifikop_operator_reconcile_duration_seconds_bucket{controller="NifiCluster",le="0.5"} 0
nifikop_operator_reconcile_duration_seconds_bucket{controller="NifiCluster",le="1"} 0
nifikop_operator_reconcile_duration_seconds_bucket{controller="NifiCluster",le="5"} 0
nifikop_operator_reconcile_duration_seconds_bucket{controller="NifiCluster",le="10"} 0
nifikop_operator_reconcile_duration_seconds_bucket{controller="NifiCluster",le="+Inf"} 0
nifikop_operator_reconcile_duration_seconds_sum{controller="NifiCluster"} 0
nifikop_operator_reconcile_duration_seconds_count{controller="NifiCluster"} 0
nifikop_operator_reconcile_duration_seconds_bucket{controller="NifiClusterTask",le="0.1"} 0
nifikop_operator_reconcile_duration_seconds_bucket{controller="NifiClusterTask",le="0.5"} 0
........
nifikop_operator_reconcile_duration_seconds_bucket{controller="NifiUserGroup",le="10"} 0
nifikop_operator_reconcile_duration_seconds_bucket{controller="NifiUserGroup",le="+Inf"} 0
nifikop_operator_reconcile_duration_seconds_sum{controller="NifiUserGroup"} 0
nifikop_operator_reconcile_duration_seconds_count{controller="NifiUserGroup"} 0
# HELP nifikop_operator_reconcile_errors_total Number of errors that occurred during reconcile operations
# TYPE nifikop_operator_reconcile_errors_total counter
nifikop_operator_reconcile_errors_total{controller="NifiCluster"} 0
nifikop_operator_reconcile_errors_total{controller="NifiClusterTask"} 0
nifikop_operator_reconcile_errors_total{controller="NifiDataflow"} 0
nifikop_operator_reconcile_errors_total{controller="NifiNodeGroupAutoscaler"} 0
nifikop_operator_reconcile_errors_total{controller="NifiParameterContext"} 0
nifikop_operator_reconcile_errors_total{controller="NifiRegistryClient"} 0
nifikop_operator_reconcile_errors_total{controller="NifiUser"} 0
nifikop_operator_reconcile_errors_total{controller="NifiUserGroup"} 0
# HELP nifikop_operator_reconcile_operations_total Total number of reconcile operations
# TYPE nifikop_operator_reconcile_operations_total counter
nifikop_operator_reconcile_operations_total{controller="NifiCluster"} 0
nifikop_operator_reconcile_operations_total{controller="NifiClusterTask"} 0
nifikop_operator_reconcile_operations_total{controller="NifiDataflow"} 0
nifikop_operator_reconcile_operations_total{controller="NifiNodeGroupAutoscaler"} 0
nifikop_operator_reconcile_operations_total{controller="NifiParameterContext"} 0
nifikop_operator_reconcile_operations_total{controller="NifiRegistryClient"} 0
nifikop_operator_reconcile_operations_total{controller="NifiUser"} 0
nifikop_operator_reconcile_operations_total{controller="NifiUserGroup"} 0
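
A check like the one above can also be scripted. The following Go sketch scans scraped metrics text and returns any `nifikop_operator_ready` sample whose value is not 1. The `notReady` function is a hypothetical helper, and it assumes samples look like the ones in this output: no trailing timestamps and no spaces inside label values.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// notReady returns the nifikop_operator_ready sample lines whose value is not 1.
func notReady(metricsText string) ([]string, error) {
	const prefix = "nifikop_operator_ready{"
	var bad []string
	sc := bufio.NewScanner(strings.NewReader(metricsText))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, prefix) {
			continue // skips "# HELP" and "# TYPE" lines and other metrics
		}
		// Assumption: the sample value is the last whitespace-separated field.
		fields := strings.Fields(line)
		v, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", line, err)
		}
		if v != 1 {
			bad = append(bad, line)
		}
	}
	return bad, sc.Err()
}

func main() {
	sample := `# TYPE nifikop_operator_ready gauge
nifikop_operator_ready{controller="NifiCluster"} 1
nifikop_operator_ready{controller="NifiUser"} 0
`
	bad, err := notReady(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, line := range bad {
		fmt.Println("not ready:", line)
	}
}
```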

@mh013370 mh013370 marked this pull request as ready for review August 25, 2022 14:57
@mh013370

I've just realized that almost all of these metrics are already provided by controller-runtime. I'm going to close this PR and re-scope the ticket.

We might still be able to add metrics capturing how many of the CRDs the operator is watching, but that will be a separate PR.

@mh013370 mh013370 closed this Aug 25, 2022
@mh013370 mh013370 mentioned this pull request Aug 25, 2022
Successfully merging this pull request may close these issues.

Operator Metrics
1 participant